38 research outputs found

    Platform independent profiling of a QCD code

    The supercomputing platforms available for high-performance-computing research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptation and optimization of existing codes for each new machine architecture. In such a context, minimizing the time needed to port a code efficiently to a new platform is of crucial importance. A possible solution to this common challenge is to use simulations of the application to help detect performance bottlenecks. Because classical cycle-accurate simulators are prohibitively expensive, coarse-grain simulations are more suitable for large parallel and distributed systems. We present a procedure for profiling the openQCD code [1] through simulation, which will globally reduce the cost of profiling and optimizing this code widely used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows fast and accurate performance predictions of HPC codes. Additionally, we anticipate accurate estimations of the program's behavior on future machines that are not yet accessible to us.
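
    SimGrid's SMPI layer is its standard way of simulating MPI applications: the source is recompiled with smpicc and replayed with smpirun on a platform description, without changing the MPI calls. Whether the authors drive openQCD exactly this way is not stated in the abstract, so the sketch below is a purely illustrative toy kernel (not openQCD); the file names, process count, and the kernel itself are assumptions.

        /* Toy MPI kernel, NOT openQCD: a stand-in showing how an MPI code can be
         * run under SimGrid's SMPI for coarse-grain profiling.
         * Hypothetical build/run commands:
         *   smpicc toy_kernel.c -o toy_kernel
         *   smpirun -np 8 -platform cluster.xml -hostfile hosts.txt ./toy_kernel
         */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define N 1000000

        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double *field = malloc(N * sizeof(double));
            for (int i = 0; i < N; i++)
                field[i] = rank + 1.0;

            double t0 = MPI_Wtime();

            /* Local compute phase followed by a global reduction, mimicking the
             * compute/communicate balance a profiler wants to expose. */
            double local = 0.0;
            for (int i = 0; i < N; i++)
                local += field[i] * field[i];

            double global = 0.0;
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            double t1 = MPI_Wtime();
            if (rank == 0)
                printf("norm^2 = %g, elapsed = %g s on %d ranks\n", global, t1 - t0, size);

            free(field);
            MPI_Finalize();
            return 0;
        }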

    Effective Reproducible Research with Org-Mode and Git

    In this article we address the question of developing a lightweight and effective workflow for conducting experimental research on modern parallel computer systems in a reproducible way. Our workflow simply builds on two well-known tools (Org-mode and Git) and enables us to address issues such as provenance tracking, experimental setup reconstruction, and replicable analysis. Although this workflow is perfectible and cannot be seen as a final solution, we have been using it for two years now and we have recently published a fully reproducible article, which demonstrates the effectiveness of our proposal.

    Towards Modeling and Simulation of Exascale Computing Platforms

    Future supercomputer platforms will face big challenges due to enormous power consumption. One possible solution to this problem would be to build HPC systems from today's energy-efficient hardware used in embedded and mobile devices, such as ARM chips. However, ARM chips have never been used in HPC programming before, which raises a number of significant challenges. We therefore experimented with ARM processors and compared their performance with better-known architectures, in this case the last generations of Intel processors. Since most scientific applications are limited by the memory bottleneck, understanding the performance of CPU caches in this context is crucial, so this research investigated processor performance as a function of the memory hierarchy. We present not only the differences and complexity of these two architectures, but also how changing seemingly innocuous aspects of an experimental setup can cause completely different behavior. Additionally, we demonstrate a very clean and systematic methodology, which aids us in achieving good performance estimations.
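
    The abstract does not include the measurement code itself. As a hedged illustration of probing performance as a function of the memory hierarchy, the sketch below sums arrays whose size grows from 16 KiB to 64 MiB, so that the sustained read bandwidth drops as the working set exceeds each cache level. The sizes, repetition counts, and POSIX timing method are assumptions for illustration, not the authors' experimental setup.

        /* Minimal working-set sweep: as the array size crosses the L1/L2/L3
         * capacities, the measured read bandwidth drops, exposing the memory
         * hierarchy.  Sizes, repetition counts and timing are illustrative only. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        static double now_sec(void)
        {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec * 1e-9;
        }

        int main(void)
        {
            volatile double sink = 0.0;   /* keeps the compiler from dropping the sums */

            for (size_t bytes = 16 * 1024; bytes <= 64 * 1024 * 1024; bytes *= 2) {
                size_t n = bytes / sizeof(double);
                double *a = malloc(bytes);
                for (size_t i = 0; i < n; i++)
                    a[i] = (double)i;

                size_t reps = (256u * 1024 * 1024) / n;  /* keep total traffic roughly constant */
                if (reps == 0) reps = 1;

                double t0 = now_sec();
                double acc = 0.0;
                for (size_t r = 0; r < reps; r++)
                    for (size_t i = 0; i < n; i++)
                        acc += a[i];
                double t1 = now_sec();
                sink += acc;

                double gbytes = (double)reps * (double)bytes / 1e9;
                printf("%8zu KiB : %6.2f GB/s\n", bytes / 1024, gbytes / (t1 - t0));
                free(a);
            }
            (void)sink;
            return 0;
        }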

    Reproducible and User-Controlled Software Environments in HPC with Guix

    Support teams of high-performance computing (HPC) systems often find themselves between a rock and a hard place: on the one hand, they understandably administer these large systems in a conservative way, but on the other hand, they try to satisfy their users by deploying up-to-date tool chains as well as libraries and scientific software. HPC system users often have no guarantee that they will be able to reproduce results at a later point in time, even on the same system: software may have been upgraded, removed, or recompiled under their feet, and they have little hope of being able to reproduce the same software environment elsewhere. We present GNU Guix and the functional package management paradigm and show how it can improve reproducibility and sharing among researchers with representative use cases. (Presented at the 2nd International Workshop on Reproducibility in Parallel Computing (RepPar), Aug 2015, Vienna, Austria. http://reppar.org)

    Writing a Reproducible Article

    We recently submitted to Europar our first article whose analysis is reproducible end to end. The goal of this talk is to explain how we proceeded and to discuss how this approach could be generalized to other case studies. The article deals with the validation of a model that makes it possible to simulate StarPU, a task-based runtime for hybrid architectures, with SimGrid. Validating this model required a substantial number of experiments on a variety of architectures. All the execution traces collected on these platforms, together with the provenance information needed to reproduce them (code versions, machine type, OS, compilation options, ...), were systematically recorded through a combined use of git and org-mode in the equivalent of a laboratory notebook. It was then very easy to write an article along the same lines. Concretely, the traces were migrated to figshare, and the document sources contain all the scripts needed to carry out the analysis. Compiling the document starts by downloading all the traces locally, then extracts the information relevant to the analysis and generates the figures that appear in the article. It thus becomes possible for anyone to start from a graph and trace it back to the underlying experimental conditions.

    Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures

    Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging, as it requires carefully offloading computations and managing data movements between the different processing units. The most promising and successful approaches so far build on task-based runtimes that abstract the machine and rely on opportunistic scheduling algorithms. As a consequence, the problem shifts to choosing the task granularity and task graph structure, and to optimizing the scheduling strategies. Trying different combinations of these alternatives is itself a challenge. Indeed, getting accurate measurements requires reserving the target system for the whole duration of the experiments. Furthermore, observations are limited to the few systems at hand and may be difficult to generalize. In this article, we show how we crafted a coarse-grain hybrid simulation/emulation of StarPU, a dynamic runtime for hybrid architectures, on top of SimGrid, a versatile simulator of distributed systems. This approach makes it possible to obtain performance predictions of classical dense linear algebra kernels that are accurate to within a few percent in a matter of seconds, allowing both runtime and application designers to quickly decide which optimizations to enable or whether it is worth investing in higher-end GPUs. Additionally, it allows robust and extensive scheduling studies to be conducted in a controlled environment whose characteristics are very close to real platforms while having reproducible behavior.
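
    The article studies dense linear algebra kernels; to make the task-based style concrete, here is a deliberately minimal StarPU program (a hypothetical vector-scaling task, not one of the kernels from the article). When StarPU is built with its SimGrid support, such task graphs can be simulated rather than executed natively; the build and configuration details are documented by StarPU and are not taken from this abstract.

        /* Minimal StarPU example: one codelet applied as a task to a vector handle.
         * Illustrative sketch only, not code from the article. */
        #include <starpu.h>
        #include <stdint.h>
        #include <stdio.h>

        #define NX 1024

        /* CPU implementation of the codelet: scale a vector in place. */
        static void scal_cpu(void *buffers[], void *cl_arg)
        {
            float factor = *(float *)cl_arg;
            struct starpu_vector_interface *v = buffers[0];
            unsigned n = STARPU_VECTOR_GET_NX(v);
            float *x = (float *)STARPU_VECTOR_GET_PTR(v);
            for (unsigned i = 0; i < n; i++)
                x[i] *= factor;
        }

        static struct starpu_codelet scal_cl = {
            .cpu_funcs = { scal_cpu },
            .nbuffers  = 1,
            .modes     = { STARPU_RW },
        };

        int main(void)
        {
            float vec[NX];
            for (int i = 0; i < NX; i++)
                vec[i] = 1.0f;

            if (starpu_init(NULL) != 0)
                return 1;

            /* Register the vector so the runtime can manage its transfers. */
            starpu_data_handle_t handle;
            starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                        (uintptr_t)vec, NX, sizeof(float));

            float factor = 3.14f;
            struct starpu_task *task = starpu_task_create();
            task->cl = &scal_cl;
            task->handles[0] = handle;
            task->cl_arg = &factor;
            task->cl_arg_size = sizeof(factor);
            starpu_task_submit(task);

            starpu_task_wait_for_all();
            starpu_data_unregister(handle);
            starpu_shutdown();

            printf("vec[0] = %f\n", vec[0]);
            return 0;
        }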